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In this paper, we propose a distributed algorithm for solving loosely coupled problems with chordal 
sparsity which relies on primal-dual interior-point methods. We achieve this by distributing the 
computations at each iteration, using message-passing. In comparison to already existing distributed 
algorithms for solving such problems, this algorithm requires far less number of iterations to converge 
to a solution with high accuracy. Furthermore, it is possible to compute an upper-bound for the 
number of required iterations which, unlike already existing methods, only depends on the coupling 
structure in the problem. We illustrate the performance of our proposed method using a set of 
numerical examples. 
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1. Introduction 

Centralized algorithms for solving optimization problems rely on the existence of a 
central computational unit powerful enough to solve the problem in a timely manner, 
and they render useless in case we lack such a unit. Also such algorithms become unviable 
when it is impossible to form the problem in a centralized manner, for instance due 
to structural constraints including privacy requirements. In cases like these, distributed 
optimization algorithms are the only resort for solving optimization problems, e.g., see [5, 
7, 13, 26, 27]. In this paper we are interested in devising efficient distributed algorithms 
for solving convex optimization problems in the form 


minimize f\ ( x ) + • • • 

+ In(x) 
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subject to G l {x) A 0, 
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where /* : M n —> M, G l : R n —» M m % A 1 £ M PiXn with YliLi Pi < n and rank(A) = p i 
for all i = 1 ,,N. Here A denotes the component-wise inequality. This problem can 
be seen as a combination of N coupled subproblems, each of which is defined by an ob¬ 
jective function f t and by constraints that are expressed by G l and matrices A 1 and b l . 
Furthermore, we assume that these subproblems are only dependent on a few elements 
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of x, and that they are loosely coupled. The structure in such problems is a form of 
partial separability, which implies that the Hessian of the problem is sparse, see e.g., 
[32] and references therein. Existing distributed algorithms for solving (1), commonly 
solve the problem using a computational network with N computational agents, each 
of which is associated with its own local subproblem. The graph describing this compu¬ 
tational network has the node set V = {1,..., N} with an edge between any two nodes 
in case they need to communicate with one another. The existence of an edge also in¬ 
dicates existence of coupling among the subproblems associated to neighboring agents. 
This graph is referred to as the computational graph of the algorithm and matches the 
coupling structure in the problem, which enables us to solve the problem distributedly 
while providing complete privacy among the agents. 

Among different algorithms for solving problems like (1) distributedly, the ones based 
on first order methods are among the simplest ones. These algorithms are devised by 
applying gradient/subgradient or proximal point methods to the problem or an equiv¬ 
alent reformulations of it, see e.g., [5, 7, 10, 13, 26, 27]. In this class, algorithms that 
are based on gradient or subgradient methods, commonly require simple local compu¬ 
tations. However, they are extremely sensitive to the scaling of the problem, see e.g., 
[26, 27]. Algorithms based on proximal point methods alleviate the scaling sensitivity 
issue, see e.g., [5, 7, 10, 13], but this comes at a price of more demanding local com¬ 
putations and/or more sophisticated communication protocols among agents, see e.g., 
[14, 15, 28, 31], 

Despite the effectiveness of this class of algorithms, they generally still require many 
iterations to converge to an accurate solution. In order to improve the convergence prop¬ 
erties of the aforementioned algorithms, there has recently been a surge of interest in 
devising distributed algorithms using second order methods, see e.g., [1, 9, 20, 25, 34], 
In [25], the authors propose a distributed optimization algorithm based on a Lagrangian 
dual decomposition technique which enables them to use second order information of 
the dual function to update the dual variables within a dual interior-point framework. 
To this end, at each iteration, every agent solves a constrained optimization problem 
for updating local primal variables and then communicates with all the other agents 
to attain the necessary information for updating the dual variables. This level of com¬ 
munication is necessary due to the coupling in the considered optimization problem. 
The authors in [34] present a distributed Newton method for solving a network utility 
maximization problem. The proposed method relies on the special structure in the prob¬ 
lem, which is that the objective function is given as a summation of several decoupled 
terms, each of which depends on a single variable. This enables them to utilize a certain 
matrix splitting method for computing Newton directions distributedly. In [1, 20] the 
authors put forth distributed primal and primal-dual interior-point methods that rely on 
proximal splitting methods, particularly ADMM, for solving for primal and primal-dual 
directions, distributedly. This then allows them to propose distributed implementations 
of their respective interior-point methods. One of the major advantages of the proposed 
algorithms in [1, 20, 34] lies in the fact that the required local computations are very 
simple. These approaches are based on inexact computations of the search directions, 
and they rely on first order or proximal methods to compute these directions. Generally 
the number of required iterations to compute the directions depends on the desired ac¬ 
curacy, and in case they require high accuracy for the computed directions, this number 
can grow very large. This means that commonly the computed directions using these 
algorithms are not accurate, and particularly the agents only have approximate con¬ 
sensus over the computed directions. This inaccuracy of the computed directions can 
also sometimes adversely affect the number of total primal or primal-dual iterations for 
solving the problem. 

In this paper we propose a distributed primal-dual interior-point method and we 
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evade the aforementioned issues by investigating another distributed approach to solve 
for primal-dual directions. To this end we borrow ideas from so-called message-passing 
algorithms for exact inference over probabilistic graphical models, [21, 29]. In this class of 
inference methods, message-passing algorithms are closely related to non-serial dynamic 
programming, see e.g., [3, 21, 24, 33]. Non-serial dynamic programming techniques, un¬ 
like serial dynamic programming, [4], that are used for solving problems with chain-like 
or serial coupling structure, are used to solve problems with general coupling structure. 
Specifically, a class of non-serial dynamic programming techniques utilize a tree repre¬ 
sentation of the coupling in the problem and use similar ideas as in serial techniques to 
solve the problem efficiently, see e.g., [3, 24, 30, 33]. We here also use a similar approach 
for computing the primal-dual directions. As we will see later, this enables us to de¬ 
vise distributed algorithms, that unlike the previous ones compute the exact directions 
within a finite number of iterations. In fact, this number can be computed a priori, and 
it only depends on the coupling structure in the problem. Unfortunately these advan¬ 
tages come at a cost. Particularly, these algorithms can only be efficiently applied to 
problems that are sufficiently sparse. Furthermore, for these algorithms the computa¬ 
tional graphs can differ from the coupling structure of the problem, and hence they can 
only provide partial privacy among the agents. The approach presented in this paper is 
also closely related to multi-frontal factorization techniques for sparse matrices, e.g., see 
[2, 12, 22], In fact we will show that the message-passing framework can be construed as 
a distributed multi-frontal factorization method using fixed pivoting for certain sparse 
symmetric indefinite matrices. To the best knowledge of the authors the closest approach 
to the one put forth in this paper is the work presented in [18, 19]. The authors for these 
papers, propose an efficient primal-dual interior-point method for solving problems with 
a so-called nested block structure. Specifically, by exploiting this structure, they present 
an efficient way for computing primal-dual directions by taking advantage of parallel 
computations when computing factorization of the coefficient matrix in the augmented 
system at each iteration. In this paper, we consider a more general coupling structure 
and focus on devising a distributed algorithm for computing the search directions, and 
we provide assurances that this can be done even when each agent has a limited access 
to information regarding the problem, due to privacy constraints. 


Outline 

Next we first define some of the common notations used in this paper, and in Section 2 
we put forth a general description of coupled optimization problems and describe math¬ 
ematical and graphical ways to express the coupling in the problem. In Section 3 we 
review some concepts related to chordal graphs. These are then used in Section 4 to 
describe distributed optimization algorithms based on message-passing for solving cou¬ 
pled optimization problems. We briefly describe the primal-dual interior-point method 
in Section 5. In Section 6, we first provide a formal mathematical description for loosely 
coupled problems and then we show how primal-dual methods can be applied in a 
distributed fashion for solving loosely coupled problems. Furthermore, in this section 
we discuss how the message-passing framework is related to multi-frontal factorization 
techniques. We test the performance of the algorithm using a numerical example in 
Section 7, and finish the paper with some concluding remarks in Section 8. 


Notation 

We denote by M the set of real scalars and by M nxm the set of real n x m matrices. 
With 1 we denote a column vector of all ones. The set of n x n symmetric matrices are 
represented by § n . The transpose of a matrix A is denoted by A T and the column and 
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null space of this matrix is denoted by C(A) and J\f(A), respectively. We denote the set 
of positive integers {1,2,. .. ,p} with N p . Given a set J C N n , the matrix Ej € Rl J l xn 
is the 0-1 matrix that is obtained by deleting the rows indexed by N n \ J from an 
identity matrix of order n, where \J\ denotes the number of elements in set J. This 

means that Ejx is a | J\- dimensional vector with the components of x that correspond 

i (k) 

to the elements in J, and we denote this vector with xj. With x^ ' we denote the Ith 
element of vector x l at the fcth iteration. Also given vectors x* for i = 1 ,N, the 
column vector (x 1 ,..., x N ) is all of the given vectors stacked. 


2. Coupled Optimization Problems 

Consider the following convex optimization problem 

minimize FAx) + ■ ■ ■ + Fjv(x), (2) 

X 

where Fj : R n —>• R for all i = 1,. .. ,N. We assume that each function Fi is only 
dependent on a small subset of elements of x. Particularly, let us denote the ordered set 
of these indices by J* C N n . We also denote the ordered set of indices of functions that 
depend on x* with 2} = {k \ i € </&} C With this description of coupling within the 
problem, we can now rewrite the problem in ( 2 ), as 

minimize F\(Ej x) H-b F n (E Jn x), (3) 

X 

where Ej i is a 0-1 matrix that is obtained from an identity matrix of order n by deleting 
the rows indexed by N n \ J;. The functions F : Rl^‘1 —> R are lower dimensional 
descriptions of F*s such that Fj(x) = Fi(Ej i x) for all x € R n and i = 1, ...,1V. For 
instance consider the following optimization problem 

minimize Fi (x) + F 2 (x) + F 3 (x) + F 4 (x) + F 5 (x) + Fq (x) , (4) 

X 

and let us assume that x € R 8 , J\ = { 1 ,3}, J 2 = {1, 2 ,4}, J 3 = {4, 5}, J 4 = {3,4}, J 5 = 
{3,6,7} and Jq = {3,8}. With this dependency description we then have 2} = {1,2}, 
1 2 = { 2 }, 1 3 = {1,4, 5, 6 }, Z 4 = {2,3,4}, Z 5 = {3}, Z 6 = {5}, 1 7 = {5} and Z 8 = { 6 }. 
This problem can then be written in the same format as in (3) as 


minimize Fi(xi, x 3 ) + F 2 (x 1 , x 2 , x 4 )+ 

X 

F 3 {x 4 , x 5 ) + F 4 (x 3, X 4 ) + ^5(^3, X 6 , x 7 ) + F 6 (x 3 ,x 8 ). ( 5 ) 

The formulation of coupled problems as in (3) enables us to get a more clear picture of 
the coupling in the problem. Next we describe how the coupling structure in (2) can be 
expressed graphically using undirected graphs. 


2.1. Coupling and Sparsity Graphs 

A graph G is specified by its vertex and edge sets V and £, respectively. The coupling 
structure in ( 2 ) can be described using an undirected graph with node or vertex set 
V c = {1,..., N} and the edge set £ c with (i,j) € £ c if and only if J; n Jj / 0. We refer 
to this graph, G c , as the coupling graph of the problem. Notice that all sets T t induce 
complete subgraphs on the coupling graph of the problem. Another graph that sheds 
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Figure 1 . The sparsity and coupling graphs for the problem in (5). 


more light on the coupling structure of the problem is the so-called sparsity graph , 
G s , of the problem. This graph is also undirected, though with node or vertex set 
V s = {1,... ,n} and the edge set £ s with (i,j) € £ s if and only if Z; n Tj / 0. Similarly, 
all sets Ji induce complete subgraphs on the sparsity graph of the problem. Let us now 
reconsider the example in (5). The sparsity and coupling graphs for this problem are 
illustrated in Figure 1, on the left and right respectively. It can then be verified that ah 
J iS and ZjS induce complete graphs over coupling and sparsity graphs, respectively. 

As we will see later graph representations of the coupling structure in problems play 
an important role in designing distributed algorithms for solving coupled problems and 
gaining insight regarding their distributed implementations. Specifically, chordal graphs 
and their characteristics play a major role in the design of our proposed algorithm. This 
is the topic of the next section. 


3. Chordal Graphs 

A graph G(V,£) with vertex set V and edge set £ is chordal if every of its cycles of 
length at least four has a chord, where a chord is an edge between two non-consecutive 
vertices in a cycle, [17, Ch. 4], A clique of G is a maximal subset of V that induces a 
complete subgraph on G. Consequently, no clique of G is entirely contained in any other 
clique, [6]. Let us denote the set of cliques of G as C q = {Ci,... , C q }. There exists a 
tree defined on Cq such that for every Q, Cj £ C q with i ^ j, CidCj is contained in all 
the cliques in the path connecting the two cliques in the tree. This property is called the 
clique intersection property, and trees with this property are referred to as clique trees. 
For instance the graph on the left in Figure 1 is chordal and has five cliques, namely 
Ci = {1,2,4}, C 2 = {1,3,4}, C 3 = {4,5}, C 4 = {3,6,7} and C 5 = {3,8}. A clique tree 
over these cliques is given in Figure 2. This tree then satisfies the clique intersection 
property, e.g., notice that C 2 D C 3 = {4} and the only clique in the path between C 2 
and C3, that is C \, also includes {4}. 

Chordal graphs and their corresponding clique trees play a central role in the design 
of the upcoming algorithms. For chordal graphs there are efficient methods for comput¬ 
ing cliques and clique trees. However, the graphs that we will encounter, particularly 
the sparsity graphs, do not have to be chordal. As a result, next and for the sake of 
completeness we first review simple heuristic methods to compute a chordal embedding 
of such graphs, where a chordal embedding of a graph G{V,£ ) is a chordal graph with 
the same vertex set and an edge set £ e such that £ C £ e . We will also explain how to 
compute its cliques and the corresponding clique tree. 


3.1. Chordal Embedding and Its Cliques 

Greedy search methods are commonly used for computing chordal embeddings of graphs, 
where one such method is presented in Algorithm 1, [11], [21], The graph G with the 
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Algorithm 1 Greedy Search Method for Chordal Embedding 

1 : Given a graph G(V, £) with V = {1,..., ?z}, C G = 0, Vt = V, £ t = £ and flag = 1 

2: repeat 

3: i = vertex in Vt with the smallest number of neighbors based on £ t 

4: Connect all the nodes in Ne(z) to each other and add the newly generated edges to £ t , 

and £ 

5: Ct = {?’} U Ne(z) 

6: £ t =£ t \ {( i,j ) e St | j e Ne(z)} 

7 : V t = V t \ {z} 

8: for k = 1 : I Cc| do 

9: if C t c C G (fc) then 

10 : flag = 0 

11: end if 

12: end for 

13: if flag then 

14: C G = C G U {C t } 

15: end if 

16: flag = 1 

17: until Vt = 0 


Algorithm 2 Maximum Weight Spanning Tree 

1: Given a weighted graph Wlf/w, £\v) with V\y = {1,..., q}, V t = 1 and £ t = 0 

2: repeat 

3: £ = {(z, j) & £w \ i &V t ,j £ V t } 

4: (i,j) = (i,j)€£ with the highest weight 

5: Vt = V t U{j}_ 

6: £ t = £t U {(z,j)} 

7 : until V t = Vw 


returned edge set £ will then be a chordal graph. This algorithm also computes the 
set of cliques of the computed chordal embedding which are returned in the set Cq. 
Notice that Ne(i) in steps 4, 5 and 6 is defined based on the most recent description of 
the sets Vt and £f The criterion used in Step 3 of the algorithm for selecting a vertex 
is the so-called min-degree criterion. There exist other versions of this algorithm that 
utilize other criteria, e.g., min-weight , min-fill and weighted-min-fill. Having computed 
a chordal embedding of the graph and its clique set, we will next review how to compute 
a clique tree over the computed clique set. 


3.2. Clique Trees 

Assume that a set of cliques for a chordal graph G is given as C G = {C\ , C< 2 , ..., C q }. 
In order to compute a clique tree over the clique set we need to first define a weighted 
undirected graph, W, over Vw = {1,... ,q} with edge set £w where (z, j) G £w if and 
only if Ci D Cj / 0, where the assigned weight to this edge is equal to \C{ fl Cf\. A 
clique tree over Cq can be computed by finding any maximum spanning tree of the 
aforementioned weighted graph. This means finding a tree in the graph that contains 
all its nodes and edges with maximal accumulated weight. An algorithm to find such 
a tree is presented in Algorithm 2 , [ 11 ], [ 21 ]. The tree described by the vertex set Vt 
and edge set £t is then a clique tree. We will now discuss distributed optimization using 
message-passing. 
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1) Ci = {1,2,4} 


C 4 = {3,6,7} 



C 2 = {1,3,4} (2 


C 3 = {4,5} 


5) C 5 = {3,8} 


Figure 2. Clique tree for the sparsity graph of the problem (5). 


4. Optimization Over Clique Trees 

In this section, we describe a distributed optimization algorithm based on message¬ 
passing. Particularly, we focus on the building blocks of this algorithm, namely we will 
provide a detailed description of its computational graph, messages exchanged among 
agents, the communication protocol they should follow and how they compute their 
corresponding optimal solutions. The convergence and computational properties of such 
methods, within exact inference over probabilistic graphical models, are extensively 
discussed in [21, Ch. 10, Ch. 13]. For the sake of completeness and future reference, 
we here also review some of these results and provide proofs for these results using the 
unified notation in this paper, in the appendix. 


4.1. Distributed Optimization Using Message-passing 

Consider the optimization problem in (2). Let G S (V S1 £ S ) denote the chordal sparsity 
graph for this problem and let C s = {Ci,..., C q } and T(Vt,£t ) be its set of cliques and 
a corresponding clique tree, respectively. It is possible to devise a distributed algorithm 
for solving this problem that utilizes the clique tree T as its computational graph. 
This means that the nodes Vt = {1,..., q} act as computational agents and collaborate 
with their neighbors that are defined by the edge set £t of the tree. For example, the 
sparsity graph for the problem in (5) has five cliques and a clique tree over these cliques 
is illustrated in Figure 2. This means the problem can be solved distributedly using 
a network of five computational agents, each of which needs to collaborate with its 
neighbors as defined by the edges of the tree, e.g., Agent 2 needs to collaborate with 
agents 1,4,5. 

In order to specify the messages exchanged among these agents, we first assign different 
terms of the objective function in (2) to each agent. A valid assignment in this framework 
is that Fi can only be assigned to agent j if Ji C Cj . We denote the ordered set of indices 
of terms of the objective function assigned to agent j by <f>j. For instance, for the problem 
in (5), assigning F\ and F± to Agent 2 would be a valid assignment since J\, J 4 C C 2 and 
hence fa = {1,4}. Notice that the assignments are not unique and for instance there 
can exist agents j and k with j ^ k so that Ji C Cj and Ji C C\ making assigning F t to 
agents j or k both valid. Also for every term of the objective function there will always 
exist an agent that it can be assigned to, which is proven in the following proposition. 

Proposition 4.1 For each term Fi of the objective function, there always exists a Cj 
for which Ji C Cj . 

Proof Recall that each set Ji induces a complete subgraph on the sparsity graph, G s , of 
the problem. Then by definition of cliques, Ji is either a subset of a clique or is a clique 
of the sparsity graph. ■ 

Before we continue with the rest of the algorithm description, we first need to define 
some new notations that are going to be extensively used in the following. Consider 
Figure 3 which illustrates a clique tree T(Vt,£t) for a given sparsity graph G s . Each 
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Figure 3. Clique tree for a sparsity graph G s . 


node in the tree is associated to a clique of G s and let Wij denote the set of indices of 
cliques that are on the node i-side of edge (■ i,j ) G St- Similarly, Wji denotes the same 
but for the ones on the j-side of (i,j). Also we denote the set of indices of variables in 
the cliques specified by W t j by V t j, i.e., Vij = UfceVF Similarly the set of indices of 
variables in cliques specified by Wji is denoted by Vj{. The set of all indices of objective 
function terms that are assigned to nodes specified by W t] is represented by 3>y, i.e., 
&ij = [j keWij 4>k, and the ones specified by Wji with <Fjj. In order to make the newly 
defined notations more clear, let us reconsider the example in (5) and its corresponding 
clique tree in Figure 2, and let us focus on the (1,2) edge. For this example then 
W 2 i = {2,4,5}, W 12 = {1,3}, V 21 = {1,3,4, 6 , 7,8}, V 12 = { 1 , 2 ,4, 5}, <h 21 = {1,4, 5, 6 } 
and 4 >i 2 = {2,3}. With the notation defined, we will now express the messages that 
are exchanged among neighboring agents. Particularly, let i and j be two neighboring 
agents, then the message sent from agent i to agent j. rriij, is given by 


rriij ( x s ..) = minimum 

x c i \s ij 

where Sij = Ci Cl Cj is the so-called separator set of agents i and j. As a result, for 
agent i to be able to send the correct message to agent j it needs to wait until it 
has received all the messages from its neighboring agents other than j. Hence, the 
information required for computing a message also sets the communication protocol 
for this algorithm. Specifically, it sets the ordering of agents in the message-passing 
procedure in the algorithm, where messages can only be initiated from the leaves of the 
clique tree and upwards to the root of the tree, which is referred to as an upward pass 
through the tree. For instance, for the problem in (5) and as can be seen in Figure 2, 
Ne(2) = {1,4, 5}. Then the message to be sent from Agent 2 to Agent 1 can be written 


5^ p 'k(xj+ J2 m k i(x s 

k£d>i fcGNe(«)\{ 7 } 


( 6 ) 
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as 


m 2 i(xi,X4) = minimum {.Fi(xi, x 3 ) + F 4 (x 3 , x 4 ) + m 4 2 (x 3 ) + m 52 (x 3 )} . (7) 

x 3 

which can only be computed if Agent 2 has received the messages from agents 4 and 5. 

The message, rn t j , that every agent j receives from a neighboring agent i in fact 
summarizes all the necessary information that agent j needs from all the agents on the 
i-side of the edge Particularly this message provides the optimal value of 


E 


as a function of the variables that agents i and j share, i.e., x s ... This is shown in the 
following theorem. 

Theorem 4.2 Consider the message sent from agent i to agent j as defined in (6). 
This message can also be equivalently rewritten as 


i( x s 


= minimum 

X V i ,\S i j 



( 8 ) 


Proof See [21, Thm. 10.3] or Appendix A. ■ 

With this description of messages and at the end of an upward-pass through the clique 
tree, the agent at the root of the tree, indexed r, will have received messages from all 
its neighbors. Consequently, it will have all the necessary information to compute its 
optimal solution by solving the following optimization problem 


x = argmm 

X C r 



+ ^ rn kr (x s 

fcSNe(r) 


The next theorem proves the optimality of such a solution. 
Theorem 4.3 The equation in (9) can be rewritten as 


x* c = argmin ^ minimum {fq (x Ji ) + •••+ Fn(x Jn )} 


l 


^\c r 


(9) 


( 10 ) 


which means that x* c denotes the optimal solution for elements of x specified by C r . 
Proof See [21, Corr. 10.2, Prop. 13.1] or Appendix B ■ 

Let us now assume that the agent at the root having computed its optimal solution x* , 
sends messages m r j(x s v .) and the computed optimal solution (x* ^ to its children, 

i.e., to all agents j £ ch(r). Here 'j denotes the optimal solution computed by 
agent r. Then all these agents, similar to the agent at the root, will then have received 
messages from all their neighbors and can compute their corresponding optimal solution 
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as 


x c = argmin 


E ^ 


+ E m ki( X s 

kENe(i) 


+ 



(ii) 


Notice that since x* is optimal, the additional regularization term in (11) will not 
affect the optimality of the solution. All it does is to assure that the computed optimal 
solution by the agent is consistent with that of the root. This observation also allows us 
to rewrite (11) as 


x* = argmin 

Ci T 


E^K)+ E TOfei ( x Sik ) + (( x L)) 

k£N e(i)\r 



= argmin 


E Fk (^) + E mki ( x s ik ) + \ 

k£<j)i fcGNe(i)\r 




This means that the root does not need to compute nor send the message m r j (x s .) 
to its neighbors and it suffices to only communicate its computed optimal solution. 
The same procedure is executed downward through the tree until we reach the leaves, 
where each agent i, having received the computed optimal solution by its parent, i.e., 

/ \ par(i) 

x* , computes its optimal solution by 

V & par ( i)i J 


. J 

x c = argmin < 

[e^k: 

)+ rn ki {x Sik ) + - 

X °i \ 

[k&tpi 

/cGNe(i)\par(i) 



(13) 


where par(i) denotes the index for the parent of agent i. As a result by the end of one 
upward-downward pass through the clique tree, all agents have computed their corre¬ 
sponding optimal solutions, and hence, at this point, the algorithm can be terminated. 
Furthermore, with this way of computing the optimal solution, it is always assured that 
the solutions computed by parents and the children are consistent with one another. 
Since this is the case for all the nodes in the clique tree, it follows that we have consen¬ 
sus over the network. A summary of this distributed approach is given in Algorithm 3. 


Algorithm 3 Distributed Optimization Using Message Passing 

1: Given sparsity graph G s of an optimization problem 

2: Compute a chordal embedding of G s , its cliques and a clique tree over the cliques. 

3: Assign each term of the objective function to one and only one of the agents. 

4: Perform message passing upwards from the leaves to the root of the tree. 

5: Perform a downward pass from the root to the leaves of the tree, where each agent, having 
received information about the optimal solution of its parent, computes its optimal solution 
using (13) and communicates it to its children. 

6: By the end of the downward pass all agents have computed their optimal solutions and the 
algorithm is terminated. 


Remark 1 Notice that in case the optimal solution of (2) is unique, then we can drop 
the regularization term in (13) since the computed optimal solutions by the agents will 
be consistent due to the uniqueness of the optimal solution. 
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Figure 4. A sparsity graph and its corresponding clique tree for the problem in (14). 


So far we have provided a distributed algorithm to compute a consistent optimal 
solution for convex optimization problems in the form (2). However, this algorithm relies 
on the fact that we are able to eliminate variables and compute the optimal objective 
value as a function of the remaining ones in closed form. This capability is essential, 
particularly for computing the exchanged messages among agents and in turn limits the 
scope of problems that can be solved using this algorithm. We will later show how the 
described algorithm can be incorporated within a primal-dual interior-point method to 
solve general convex optimization problems, distributedly. 

Remark 2 The message-passing scheme presented in this section is in fact a recursive 
algorithm and it terminates within a finite number of steps or after an upward-downward 
pass. Let us define, L, the height of a tree as the maximum number of edges in a path 
from the root to a leaf. This number then tells us how many steps it will take to perform 
the upward-downward pass through the tree. As a result, the shorter the tree the fewer 
the number of steps we need to take to complete a pass through the tree and compute 
the solution. Due to this fact, and since given a tree we can choose any node to be the 
root, having computed the clique tree we can improve the convergence properties of our 
algorithm by choosing a node as the root that gives us the minimum height. 


4.2. Modifying the Generation of the Computational Graph 

As was discussed above, the clique tree of the sparsity graph of a coupled problem, 
defines the computational graph for the distributed algorithm that solves it. Given the 
sparsity graph for the problem, one of the ways for computing a chordal embedding and 
a clique tree for this graph is through the use of algorithms 1 and 2. Particularly, using 
these algorithms allows one to automate the procedure for producing a clique tree for 
any given sparsity graph, with possibly different outcomes depending on the choice of 
algorithms. However, it is important to note that sometimes manually adding edges to 
the sparsity graph or its chordal embedding can enable us to shape the clique tree to our 
benefit and produce more suitable distributed solutions. In this case, though, extra care 
must be taken. For instance, it is important to assure that the modified sparsity graph 
is still a reasonable representation of the coupling in the problem and that the generated 
tree satisfies the clique intersection property, and is in fact a clique tree, as this property 
has been essential in the proof of the theorems presented in this section. We illustrate 
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this using an example. Consider the following coupled optimization problem 


minimize fi(x 1 ,x 2 ) + //(aq, x 4 ) + f 3 (x 5 ,x e ) + f 4 (x 7 , x 8 ) 

(14a) 

subject to gi(x\, x 2 , xg) < 0 

(14b) 

g 2 (x 3 ,x 4 ,x 10 ) < 0 

(14c) 

ff3(x5,X 6 ,Xu) < 0 

(14d) 

g 4 (x 7 , x 8 ,x 12 ) < 0 

(14e) 

g5(xio,xn) < 0 

(14f) 

Xg - X 10 = 0 

(14g) 

Xu - x 42 = 0. 

(14h) 


This problem can be equivalently rewritten as 


minimize /i(xi, x 2 ) + Z Cl (aq, x 2 , xg) + f 2 (x 3 ,x 4 ) + Zc 2 (x 3 ,x 4 ,x w )+ 
h(x5,x 6 ) +1c 3 (x5,X 6 ,X h) + f 4 (x 7 ,X 8 ) + Z Ci {x 7,X 8 ,X 12 ) + 

Zc 5 (xio,x n ) +Z Cb (x 9 ,x 10 ) + Z Cj (x u,x 12 ), 

where for i = 1,..., 7, are the indicator functions for the constraints in (14b)- (14h), 
respectively, defined as 


Z c , (x) = i ° X€Ci 

I oo Otherwise 

This problem is in the same format as (3). Let us assume that we intend to produce a 
distributed algorithm for solving this problem using message-passing that would take 
full advantage of parallel computations. Without using any intuition regarding the prob¬ 
lem and/or incorporating any particular preference regarding the resulting distributed 
algorithm, we can produce the chordal sparsity graph for this problem as depicted in 
the top graph of Figure 4. A clique tree for this sparsity graph can be computed using 
algorithms 1 and 2, which is illustrated in the bottom plot of Figure 4. A distributed 
algorithm based on this computational graph does not take full advantage of parallel 
computations. In order to produce a distributed algorithm that better facilitates the use 
of parallel computations, it is possible to modify the sparsity graph of the problem as 
shown in Figure 5, the top graph, where we have added additional edges, marked with 
dashed lines, to the graph while preserving its chordal property. Notice that by doing so, 
we have virtually grouped variables xg-x\ 2 , that couple the terms in the objective func¬ 
tion and constraints, together. The corresponding clique tree for this graph is illustrated 
in Figure 5, the bottom graph. Notice that due to the special structure in the clique 
tree, within the message-passing algorithm the computation of the messages generated 
from agents 1-4 can be done independently, and hence in parallel. So using this clique 
tree as the computational graph of the algorithm enables us to fully take advantage of 
parallel computations. Next we briefly describe a primal-dual interior-point method for 
solving convex optimization problems, and then we investigate the possibility of devising 
distributed algorithms based on these methods for solving loosely coupled problems. 
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Figure 5. An alternative sparsity graph and its corresponding clique tree for the problem in (14). 


5. Primal-dual Interior-point Method 

Consider the following convex optimization problem 

minimize F(x) 

subject to gi(x) <0, i = 1,. .., m, (15) 

Ax = b, 

where F: R n —> R, g t : R n — >■ R and A € R pxri with p < n and rank(A) = p. Under the 
assumption that we have constraint qualification, e.g., that there exist a strictly feasible 
point, then x*, v* and A* constitute a primal-dual optimal solution for (15) if and only 
if they satisfy the KKT optimality conditions for this problem, given as 


m 


VF(x) + ^ AjV 5 j(x) + A t v = 0, 

i= 1 

(16a) 

A, > 0, i = 1, ..., m, 

(16b) 

gi{x) <0, i = 1,... ,m, 

(16c) 

-\gi(x) = 0, i = 1,... ,m, 

(16d) 

Ax = b. 

(16e) 


A primal-dual interior-point method computes such a solution by iteratively solving 
linearized perturbed versions of (16) where (16d) is modified as 

-Xigi(x) = 1/t, i = 1, ... ,m, 

with t > 0, [8, 35]. Particularly, for this framework, at each iteration l given primal and 
dual iterates x^ l \ A^ and v® so that gi{x W) < 0 and A® > 0 for all i = 1 ,... , m, the 
next update direction is computed by solving the linearization of 

m 

VF(x) + ^ A iVgi(x) + A t v = 0, (17a) 

i —1 


A igi{x) = 1/t, i = 


(17b) 







June 30, 2015 Optimization Methods and Software OMSPaperRev8 


14 


Taylor & Francis and I. T. Consultant 


Ax = b. 

at the current iterates, given as 

^V 2 F(x«) + \fy 2 gi{x^) j Ax+ 

m 

J2 Vgi{x^)AXi + A t Av - -~ (0 


where 


= — r 


dual’ 


i=1 


-\\ l) Vgi(x m ) T Ax - gi (x {l) )AXi = - (r£nt) . , 
i = 1, ... ,771, 


A Ax = -rg maI , 


r dL = VF(x ( ^) + AS I) V 5 i (s {0 ) + A T v^ 


(0i 


i= 1 


(^Snt) . = -A i l) gi(x®) - 1 /t, i = 1 ,... ,m, 


A 1 ) — a A 1 ) _ h 

' primal — u ' 


(17c) 


(18a) 

(18b) 

(18c) 

(19a) 

(19b) 

(19c) 


Define G ( J } = diag( 5 i(x W ),.. .,g 1 (x W )), D g (x) = [Vyi(x) ... V gm (x)] T , 



V 2 F( X W) + *i l) V 2 ffi(x (l) ) - 

i —1 i —1 


x (0 

— ±^V 9 i (x^)X/ gi (x^) T , 

gi {x^>) 


and r® = r^ al + D g (x^) T G d 1 rj. l J nt . By eliminating AA as 

AA = -G d (x (/) ) _1 (diag(A (Z) ).Dc/(x (z) )Ax - r^ } nt ) , (20) 

we can rewrite (18) as 


Spd A 


Ax 

A 0 


Ax 


r (0 

JO 

primal 


( 21 ) 


which has a lower dimension than (18), and unlike the system of equations in (18), is 
symmetric. This system of equations is sometimes referred to as the augmented system. 
It is also possible to further eliminate Ax in (21) and then solve the so-called normal 
equations for computing Av. However, this commonly destroys the inherent structure in 
the problem, and hence we abstain from performing any further elimination of variables. 
The system of equations in (21) also expresses the optimality conditions for the following 
quadratic program 


minimize ^Ax T H^Ax + (r^) r Ax 
subject to HAx = -rjj imal) 


(22) 
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Algorithm 4 Primal-dual Interior-point Method, [ 8 ] 

1: Given l = 0, p > 1, e > 0, ef eas > 0, > 0, v^°\ x^ such that g%{x^) < 0 for all 

i = and ?) (0) = YT=i -A• 0) 5 i(a; (0) ) 

2: repeat 

3: t = lim/i 

4: Given t, A^, v® and compute Ax^ + P, AA^ + P, AiA i+ P by solving (21) and (20) 

5: Compute cd /+1 ) using line search 

6 : = x + a (i+1) Aa; ( ' + P 

7: A (i+ P = A ( P + a (J+1) AA (i+ P 

8 : = yW -)- 

9: 1 = 1 + 1 

10: V {l) =Ytl~^ fW 0 ) 

11: until 111* primal 11" 5 llbduJ 2 < £ feas &nd fj® < 6 

and hence, we can compute Ax and Av also by solving ( 22 ). Having computed Ax and 
Av, AX can then be computed using (20), which then allows us to update the iterates 
along the computed directions. A layout for a primal-dual interior-point is given in 
Algorithm 4. 

Remark 3 Notice that in order for the computed directions to constitute a suitable search 
direction, the coefficient matrix in (21) needs to be nonsingular. There are different 
assumptions that guarantee such property, e.g., that N{Hff) C\N(A) = {0}, [8]. So, we 
assume that the problems we consider satisfy this property. 

There are different approaches for computing proper step sizes in the 5th step of the 

algorithm. One of such approaches ensures that g*(x^ + P) < 0 for i = 1 and 

A^ +1 ) >- 0 , by first setting 

«max = minimum jl, minimum j— A®/AA^ +1 ^ | AA f +1 ' ) < oj j , 

and conducting a backtracking line search as below 

while 3 i: gi(x ® + a^ +1 )Ax^ +1 )) > 0 do 
qT+i) = /3c^ +1 ) 

end while 

with /3 € (0,1) and a^ +1 ^ initialized as 0.99a max . Moreover, in order to ensure steady 
decrease of the primal and dual residuals, the back tracking is continued as 

while | (r^ } al , || > (1 - 7« (m) ) || (^ ima P r dil) \ do 

qT+i) = f3a 

end while 

where 7 G [0.01,0.1]. The resulting a^ +1 ) ensures that the primal and dual iterates 
remain feasible at each iteration and that the primal and dual residuals will converge 
to zero, [ 8 , 35]. 

Remark 4 The primal-dual interior-point method presented in Algorithm 4, is an infea¬ 
sible long step variant of such methods, [35]. There are other alternative implementations 
of primal-dual methods that particularly differ in their choice of search directions, namely 
short-step, predictor-corrector and Mehrotra’s predictor-corrector. The main difference 
between the distinct primal-dual directions, commonly arise due to different approaches 
for perturbing the KKT conditions, specially through the choice oft, [35]. This means 
that for the linear system of equations in (17), only the right hand side of the equations 
will be different and hence the structure of the coefficient matrix in (21) remains the 
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same for all the aforementioned variants. Consequently, all the upcoming discussions 
will be valid for other such variants. 

Next we provide a formal description of loosely coupled problems and will show how 
we can devise a distributed primal-dual interior-point method for solving these problems 
using message-passing. 


6. A Distributed Primal-dual Interior-point Method 

In this section we put forth a distributed primal-dual interior-point method for solving 
loosely coupled problems. Particularly, we first provide a formal description for loosely 
coupled problems and then give details on how to compute the primal-dual directions 
and proper step sizes, and how to decide on terminating the algorithm distributedly. 


6.1. Loosely Coupled Optimization Problems 

Consider the convex optimization problem in (1). We can provide mathematical and 
graphical descriptions of the coupling structure in this problem, as in Section 2. The 
only difference is that the coupling structure will in this case concern the triplets fi,G l 
and A* instead of single functions T 1 ,;. Similar to (3) we can reformulate (1) as 


minimize 

X 

fi( E JiX) 


+ f N {E jN x), 

(23a) 

subject to 

&{E Jt x) 

i R o, 

i = 1 ,..., N, 

(23b) 


A l E JiX = 

~-b\ 

i = 1,..., N, 

(23c) 


where in this formulation, the functions fi: —» K and G i : M.1^1 —> R mi are defined 

in the same manner as the functions Fi, rank( \_Efj ( A l ) T ... EJ n ) = p with p = 
TiiLi Pi, and the matrices A i £ RP iX l^l are defined by removing unnecessary columns 
from A 1 where pt < | J*| and rank(A*) = p for all i = 1 ,..., N. Furthermore, we assume 
that the loose coupling in the problem is such that the sparsity graph of the problem is 
such that for all cliques in the clique tree, we have \Cf\ «n and that \C t n Cj\ is small 
in comparison to the cliques sizes. 

From now on let us assume that the chordal sparsity graph of the problem in (23) has 
q cliques and that T(V), £ t ) defines its corresponding clique tree. Using the guidelines 
discussed in Section 4, we can then assign different subproblems that build up (23) to 
each node or agent in the tree. As we will show later, our proposed distributed primal- 
dual method utilizes this clique tree as its computational graph. Before we go further 
and in order to make the description of the messages and the message-passing procedure 
simpler let us group the equality constraints assigned to each agent j as 

A 7 x = b J (24) 


T A^Ej. 1 

a 7 = : 

A lm i Ej. 

L J 

b 7 = (6* 1 ,..., b im i ) 


(25a) 

(25b) 


where 
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for j = 1,... ,q, where 4>j = {?'i,. .., i mj }. We can then rewrite the problem in (23) as 

minimize fi(E Jl x) H- \- j N {Ej N x), (26a) 

subject to G\Ej.x ) -< 0, i = 1, ..., N, (26b) 

A i E Ci x = b\ i € (26c) 

where the coefficient matrices A* are obtained by permuting the columns of the matrices 
A 1 . Next we solve (23) by applying the primal-dual method in Algorithm 4 to (26) 
and will discuss how it can be done distributedly within a primal-dual framework. 
The computational burden of each iteration of a primal-dual interior-point method 
is dominated by primal-dual directions computation. We hence start by describing a 
distributed algorithm for calculating these directions using message-passing. 


6.2. Distributed Computation of Primal-dual Directions 

Computing the primal-dual directions requires solving the linear system of equations 
in (21) where for the problem in (26) 



EE 

i=l k£<j)i 


EXf E ’- 


(27) 


with 



v 2 /jx«) + A f l) y 2 G){xfl) - ]T 


3 =1 


3 = 1 


-^Ej-VG'fxW) (v<3<(x;))) T , (28) 


A = blk diag (A 1 ,..., A 9 ) E with E 
where 




r (0 


E T ... ,?’'?>(0) 


m k 

r<M = J2 {VA(X«) + £>‘- (,) VG}(*<'>) 


+ 


ke<pi 


3 = 1 


DG‘ 


' ( x jfe) diag (G fc (*W)) + (A*)V>«, 


with 


r c fe ’2 = -diag(A fe A0)G fe (^))-ll 


t 


and primal = ^primal’ ^primal) with 


9.(0 


r *-(0 — AV® - h* 

' primal x Ci u • 


(29) 


The key for devising a distributed algorithm based on a primal-dual interior-point 
method, is to exploit the structure in this linear system of equations that also expresses 
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the optimality conditions for the following quadratic program 


minimize -Ax T ( ^ Ej k H^’^Ej k J Ax + Eg i Ax (30a) 

i= 1 1 \kG<l>i J 

subject to A- l Ec t (Ax + x®) = b*, i = l,...,q. (30b) 

which can be rewritten as 

i - y 

minimize ^^ Ec t Ax + {r l ^) T Ec t Ax (31a) 

i= 1 

subject to A. l Ec t (Ax + a;®) = b*, « = (31b) 


where = Ylke(j>i(F k ) T E\ with E\ = Ej k E^,. In order to assure that the 

property in Remark 3 also holds for the problem in (31), we need to make assumptions 
regarding the subproblems assigned to each agent, which is described in the following 
lemma. 

Lemma 6.1 The condition in Remark 3 holds for the problem in (31), if AffH^J ) n 
Af (A 1 ) = {0} for all subproblems i G N 9 . 

Proof The condition in Remark 3 is equivalent to 



n Af 


u 


A l E Cl 
A q E c 


{ 0 }. 


(32) 


Since Ejjfti’C Eq. € §+ for all* = 1,..., N, this condition can be equivalently rewritten 
as 



(33) 


By arranging the terms in (33) and using associative property of the intersection oper¬ 
ator, we can equivalently reformulate it as 


H K(^H;f^)nAT(A^ Ci ) 


{ 0 }. 


(34) 


Notice that the Ec t s are constructed such that they have full row rank. Now let 
V(Hji'>) fl A (A ! ) = {0} for all i = 1 and assume that there exists x / 0 

such that 


X € n [^(EcPpd E a) nAf(AE Ci ) . 

2—1 

This then implies that for any x c . = Ec t x it must hold that x c , € Af fl 

N (A 1 ) for alH = 1,..., q, or equivalently x c _ £ A f ^ nA / (A*) for all* = 1,, q, 
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since Ec t s have full row rank. Under the assumption that x 7 ^ 0, then for some i € N g , 
x c . / 0. Therefore, xc t € N (Rid ^ D M (A*) and x c . 7 ^ 0 for some i. This is in 

contradiction to the assumption that A/" (h^ ^ n AT (A*) = { 0 } for all i = 1 
This completes the proof. ■ 

We can rewrite (31) as the following unconstrained optimization problem 

minimize -^Axftl^f 1 Ax Cj + (r l, ^) T Ax c . +Xj-.{/S.x c .) (35) 

i= I ^-~v-' 

A(a* 04 ) 


where 7) is the polyhedral set defined by the ith equality constraint in (31) and I 7 - is 
its corresponding indicator function. The problem in (35) is in the same form as (3). 
Notice that the coupling structure in this problem remains the same during the primal- 
dual iterations. Furthermore, the coupling structure for this problem is such that we can 
solve it by performing message-passing over the clique tree for the sparsity graph of (23). 
Considering the subproblem assignments discussed in Section 6.1, at each iteration of 
the primal-dual method, each agent will have the necessary information to form their 
corresponding quadratic subproblems and take part in the message passing framework. 

Let us now focus on how the exchanged messages can be computed and what infor¬ 
mation needs to be communicated within the message passing procedure. Firstly, notice 
that each F t describes an equality constrained quadratic program. Consequently, com¬ 
puting the exchanged messages for solving (35), requires us to compute the optimal 
objective value of equality constrained quadratic programs parametrically as a function 
of certain variables. We next put forth guidelines on how this can be done efficiently. 
Consider the following quadratic program 


minimize 


z 

T 

Qzz 

Qzy 


z 

1 

Qz 

T 

Z 

y_ 


[Ql 

Qyy _ 


y_ 

+ 

Qy. 


y_ 


+ c 


subject to A z z + A y y = b 


(36) 


A z A y ] e R pxn with n = n z +n y , rank([A 2 A y ]) = rank(A 2 ) = 

zy I) nW( [A z A y ]) = {0}. Without loss of generality assume that 

we intend to solve this optimization problem parametrically as a function of y. This 
means that we want to solve the following optimization problem 


where z € 
p, and that vV( 


>ye 

Qzz Qz 

iQzy Q yy. 


minimize \z T Q zz z + z T (Q zy y + q z ) + \y T Q yy y + y T q y + c 

z Z Z 

subject to A z z = b — A y y (37) 


The optimality conditions for this problem are given as 


Qzz 

AT] 


2 : 


-Qz 


Qzy 

u 2 

0 


V 


b 


. Ay _ 


o / " Hy) 


(38) 


Notice that for the problem in (36) O is nonsingular, which is shown in the following 
lemma. 
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Lemma 6.2 Consider the problem in (36), and assume that rank( \A Z Ay \) = 

rank(H z ) = p and Af( Sjp Sa v ) fl Af( [A z Ay\ ) = {0}. Then O is nonsingular. 

^°c zv ^cyy 


\yczy ^yy J 

Proof Firstly notice that under the assumption in the lemma, the optimality condition 
for (36), given as 

Qzz Qzy A z z q z 

Qzy Qyy Ay y = —q y , (39) 


A z Ay 0 J L^. 


has a unique solution and its coefficient matrix is nonsingular. This means that 


/ Qzz 

rank Q T zy 


or equivalently 


n Af(A z ) = {0}. 


Since 


Qzz Qzy 
Qzy Q yy 


is positive semidefinite, we can rewrite it as 


Qzz Qzy 
Qzy Q yy. 


U U 
V V 


where assuming rank 


Qzz Qzy 

Qzy Q yy. 


= r < n, 


€ R nxr and has full column rank. 


Then the condition in (41) can be rewritten as 


U T ^j n Af(A z ) = Af (U T ) n Af(A z ) = {0}. (43) 

Furthermore, since C(U T ) and Af(U) are orthogonal complements, we have Af (UU T ) = 
Af ( U T ), which enables us to rewrite (43) as 

Af (UU T ) n Af(A z ) = Af (Q zz ) n Af(A z ) = {0} (44) 

which is equivalent to O being nonsingular. This completes the proof. ■ 

By Lemma 6.2, we can then solve (38) as 


_ = O 
v 


-Qz] r q 


Having computed the optimal solution parametrically as a function of y, we can now 
compute the optimal objective value as a convex quadratic function of y. p*(y), by 
simply substituting z from (45) in the objective function of (36). 
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We can now discuss the computation and content of the messages. Firstly notice that 
each of the constraints in (31b) can be written as 


A i A *c 4 \ Sj 


par(i) 


+ AoAx, 


i par(i) 


= b' - A*x«. 


For now assume that [Aj A|] and A'j are full row rank for all i £ N g . Also recall that 
for the problem in (35) the message to be sent from agent i to its parent par(i) is given 
as 


m lJ (Ax' s . ;par(i) ) = minimum l F<( Ax c .) + ^ m ki ( Ax s .J . (46) 

x c i \s i par(i) [ fcech(i) J 

Then, for this problem, all the exchanged messages define quadratic functions as de¬ 
scribed above, which is shown in the following theorem. 

Theorem 6.3 Consider the message description given in (46). For the problem in (35), 
all the exchanged messages are quadratic functions. 

Proof We prove this using induction, where we start with the agents at the leaves. For 
every agent i £ leaves (T), the computed message to be sent to the corresponding parent 
can be computed by solving 

minimize Ax c . + (r l ’^) T Ax c . (47a) 

subject to A*(Ax c . + = b*, (47b) 

parametrically as a function of Ax„ . Under the assumption stated in Lemma 6.1, 

*par(i)i 

■mh;-?) n A7(A*) = {0}. As a result the assumption in Lemma 6.2 holds for (47) and 
hence we can use the procedure discussed above to solve the problem parametrically. 
Consequently the messages sent from the leaves are quadratic functions. Now consider 
an agent i in the middle of the tree and assume that all the messages received by this 
agent are quadratic functions of the form 

m ki {Ax Sik ) = Ax T Sik Q ki Ax Sik + Ax s . k + c ki . 

Then this agent can compute the message to be sent to its parent, by solving 


minimize -Ax^. I Ej k Q ki E ik I Ax c 

\ fcech(i) J 

+ | r*4b + Ef k Qki J Ax c» + Cj 
fcgch(i) J 


subject to A*(Ax c +xg^) = b* 


(48a) 

(48b) 

(48c) 


with Ei k = Es ik Ej Jt , parametrically as a function of Ax s . Notice that the assump¬ 
tion in Lemma 6.1 implies that M + Ylk£ch(i) Ej k Q kl E lk \ n A7(A*) = {0}. This 
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means that the assumption in Lemma 6.2 would also be satisfied and hence the com¬ 
puted message to the parent would be a quadratic function. This completes the proof. 

■ 

Notice that sending the message m^j to agent j requires agent i to send the data 
matrices that define the quadratic function. Following the steps of the message-passing 
method discussed in Section 4, we can now compute the primal variables direction, Ax, 
distributedly. 

It now remains to discuss how to compute the dual variables directions, Av k for 
k = 1,..., q, and AX k for k = 1,..., N. We will next show that in fact it is possible to 
compute the optimal dual variables direction during the downward pass of the message¬ 
passing algorithm. Firstly recall that during the upward pass each agent i, except the 
agent at the root, having received all the messages from its children forms (48) and 
solves it parameterically as a function of Ax„ by first computing 

* par(i) 


r A ^\ s , par(i) ' 



Ax„ + 

h\ 

<1 



‘-’i par (i) 

X. 


as described above, and then communicating the parametric optimal objective value as 
the message to the parent. Notice that (49) defines the optimality conditions for (48) 
given Ax s . or equivalently the optimality conditions of (13) without the regulariza¬ 
tion term, which we are allowed to neglect since by the assumption in Lemma 6.1 the 
optimal solution of (31) is unique, see Remark 1. As a result this agent having received 
Ax* from its parent can use (49) to compute its optimal primal solution. As we will 

show later the computed dual variables in (49) during this process will also be optimal 
for (31). The agent at the root can also compute its optimal dual variables in a similar 
manner. Particularly , this agent having received all the messages from its children can 
also form the problem in (48). Notice that since par(?’) = 0 then S paiI ( r ) r = 0. As a 
result (49) for this agent becomes 


r i 


X 

i 

* 

t. 

< 


X 


which is the optimality condition for (48). Consequently, the dual variables computed 
by this agent when calculating its optimal primal variables will in fact be optimal for 
the problem in (31). The next theorem shows that the computed primal directions Ax* 
and dual directions Av* using this approach then satisfy the optimality conditions for 
the complete problem in (30) and hence constitute a valid update direction, i.e., we can 
choose Ax^ +1 ) = Ax* and Au^ +1 ) = Av*. 

Theorem 6.4 If each agent i € computes its corresponding optimal primal and 
dual variables directions, Ax* , (Av 1 )*, using the procedure discussed above, then the 
calculated directions by all agents constitute an optimal primal-dual solution for the 
problem in (30). 

Proof We prove this theorem by establishing the connection between the message¬ 
passing procedure and row and column manipulations on the KKT optimality conditions 
of (31). So let us start from the leaves of the tree. From the point of view of agents 
i € leaves(T) = {*i,... ,i t } we can rewrite (31) as 
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iGleaves(T) 
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subject to A y y l + A yu u = b y , i G leaves(T) 
Azu'tt + -A 2;-^ = 62:, 


where y l = Ax OAS . psr(j) , u = Ax St with S T = U i6leaves ( T )S ipar(i ), z = Az SjASt with 

*-V — hJjgi eaves (j') Vp ar (j)* 5 Hyu Ryu^i par(i)) ~ ^ip a ,r(i)^'uu^ipax(i)i Qu ~ par(i)^ u 

and = Ayy E, par ^) with -Ej par (j) = par(i) E'J.. In other words, in this formulation 
the variables y, u and z denote the variables present only in the subproblems assigned 
to the leaves, the variables that appear in both these subproblems and subproblems 
assigned to all the other agents and the variables that are not present in the subproblems 
assigned to the leaves, respectively. Furthermore, each of the terms in the sum in (51a) 
and the constraints in (51b) denote the cost functions and equality constraints that 
are assigned to the ith agent. The KKT optimality conditions for this problem can be 
written as 
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which by conducting column and row permutations can be rewritten as 
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By Lemma 6.2, the blocks 


4, (4T 

A l j 0 


are all nonsingular and hence we can define 
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. If we pre-multiply (53) by Q i, we can rewrite it as 
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i £ leaves(T) 

(55a) 

(55b) 


R«u = E A T par«) (Ku + (HifRiyHi + (H{) T Ri u + (R} \ u ) t H[ ) E ipar(i) (56a) 

iG leaves (T) 

q« = E E Lrd) (4 + (^i) T 4 + (4u) T M + (iC r K w i>\) (56b) 

iG leaves (T) 


Notice that considering the definitions in (51) and (55), the matrices H\,H 2 ,h\ and 
h l 2 in (55a) and (49) are the same, and hence the terms (H[) T R l yy H\ + (H\) T R l yu + 
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(R l yu ) T H\+R l uu an< 4 Qu+(H[) T qy+(Ry U ) T h\ + (H{) T Ry y h\ in (56) are the data matrices 
that define the quadratic and linear terms of the message sent from each of the leaves to 
its parent, and the additional terms Ej vax u\ and E ipai ^ merely assure that the messages 
are communicated to the corresponding parents. By performing the pre-multiplication 
above we have in fact pruned the leaves of the tree, and have eliminated the variables 
that are only present in their respective subproblems. We can now conduct the same 
procedure outlined in (52)—(56), that is repartitioning of variables and performing row 
and column permutations, for the parents that all their children have been pruned, 
using (55b). We continue this approach until we have pruned all the nodes in the tree 
except for the root, as 



Ci\S ipar(i) 

Av l 



Ax s . 


par(i) 



ieN,\ {r} 


(h^° + Efcech(r) Ej k Q kr E rk ) (AT 
A r 0 


A x Cr 


_ f r r,6) + ^ fcgch(r) Ef k q kr \ 

Av r 


r o(0 

L J 


primal 


(57a) 

(57b) 


where what remains to solve is the optimality conditions for the problem of the agent 
at the root, given in (48), in (57b). Notice that this procedure is in fact the same as the 
upward pass in Algorithm 3. At this point we can solve (57b) and back substitute the 
solution in the equations in (57a) with the reverse ordering of the upward pass, which 
corresponds to the downward pass through the clique tree in Algorithm 3. With this we 
have shown the equivalence between applying the message-passing algorithm to (31) and 
solving the KKT conditions of this problem by performing row/column manipulations, 
and hence have completed the proof. 


Finally, during the downward pass and by (20), each agent having computed its primal 
variables direction A,t* , can compute the dual variables directions corresponding to its 
inequality constraints by 

AA fc ’(* +1 ) = - diag(G fc (x«))- 1 {dmg{\ k ^)DG\xfl)Ax* Jk - r c fe ’2) , (58) 


for all k £ 4>i. 

Remark 5 Notice that the proposed message-passing algorithm for computing the primal- 
dual directions relies on the assumption thatAf + X^fcech(i) EjkQkiEik) = 

{0} for all i £ N q , and the conditions in Lemma 6.1 describe a sufficient condition for 
this assumption to hold. However, the aforementioned assumption can still hold even if 
the conditions in Lemma 6.1 are not satisfied, in which case the proposed algorithm can 
still be used. 

Remark 6 It is also possible to use a feasible primal interior-point method for solving 
the problem in (23). For a primal interior-point method, unlike a primal-dual one, at 
first the KKT optimality conditions are equivalently modified by eliminating the dual 
variables corresponding to the inequality constraints, using the perturbed complementar¬ 
ity conditions as in (17b). Then the resulting nonlinear system of equations is solved 
using the Newton method, iteratively, [8, 11.3.4]. At each iteration of a feasible primal 
interior-point method, we only need to update the primal variables, where their corre¬ 
sponding update direction is computed by solving a linear system of equations similar 
to (21). In fact, applying a primal interior-point method to the problem in (23) would 
then, at each iteration, require solving a linear system of equations that will have the 
same structure as the one we solve in a primal-dual interior-point method. Hence, we 
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can use the same message-passing procedure discussed above to compute the primal vari¬ 
ables directions within a primal framework. Primal interior-point methods are known to 
perform worse than their primal-dual counterparts. However, since we do not need to 
compute dual variables directions at each iteration, we can relax the rank condition on 
the equality constraints. This is because this condition has solely been used for the proof 
of Theorem 6.2 h and only concerns the computations of the dual variables. 

The distributed algorithm for computing the primal-dual directions in this section 
relies on the seemingly restrictive rank conditions that ... EJ n (A n ) t ^ , 

[A| A?,] and A'j are all full row rank for all i £ N,. Next we show that these con¬ 
ditions do not affect the generality of the algorithm and in fact they can be imposed by 
conducting a preprocessing of equality constraints. 

6.2.1. Preprocessing of the Equality Constraints 

We can impose the necessary rank conditions by conducting a preprocessing on the 
equality constraints, prior to application of the primal-dual method. This preprocessing 
can be conducted distributedly over the same tree used for computing the search direc¬ 
tions. Let us assume that the constraints assigned to each of the agents at the leaves, 
i.e., all i € leaves(T), are given as 


A\x 


c«\s ipar(i) 


+ A9X, 


i par(i) 


= b' 


(59) 


and that [Aj A?>] € R PiXTli anc j that rank(Aj) = qi < p.[. Every such agent can then 
compute a rank revealing QR factorization for A^ as 



TT 

0 ’ 


(60) 


where Q l € M PiXpi is an orthonormal matrix and R' € CAS*pared with rank(R*) = q % . 
As a result the constraints in (59) can be equivalently rewritten as 


Ai a|- 

. 0 A 3 


x r 



(61) 


where 


A) A* 

. 0 A| 


:= Q‘ [A- A!,] 
:= <3‘b‘. 


Once each agent at the leaves has computed the reformulation of its equality constraints, 
it will then remove the equality constraints defined by the second row equations in (61) 
from its equality constraints, and communicates them to its parent. At this point the 
equality constraints assigned to each agent i at the leaves, becomes 

[Ai A*] * Ci = b\ 

where [A( A(>] and A( are both full row rank. Then every parent that has received all 
the equality constraints from its children, appends these constraints to its own set of 
equality constraints, and performs the same procedure as was conducted by the agents 
at the leaves. This process is then continued until we reach the root of the tree. The 
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agent at the root will then conduct the same reformulation of its corresponding equality 
constraints and removes the unnecessary trivial equality constraints. Notice that at this 
point the equality constraints for all agents satisfy the necessary rank conditions, and 
hence the preprocessing is accomplished after an upward pass through the tree. 

Remark 1 In a similar manner as in the proof of Theorem 6.4, it can be shown that 
the preprocessing procedure presented in this section (except for the removal of trivial 
constraints by the agent at the root) can be viewed as conducting column permutations 
on the coefficient matrix of the equality constraints and pre-multiplying it by a non¬ 
singular matrix. Consequently, this preprocessing of the equality constraints, does not 
change the feasible set. 

In the proof of Theorem 6.4 we described the equivalence between applying the 
message-passing scheme in Algorithm 3 to the problem in (31) and solving its cor¬ 
responding KKT system through column and row manipulations. Inspired by this dis¬ 
cussion, and before we describe distributed implementations of other components of the 
primal-dual interior-point method in Algorithm 4, we explore how the message-passing 
algorithm in 3 can be construed as a multi-frontal factorization technique. 


6.3. Relations to Multi-frontal Factorization Techniques 

Let us compactly rewrite the KKT system in (52) as 


Then (53) can be written as 
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z 

Av 


= r. 


Pi if if Pi 
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Av 


= P\ r, 


(62) 


(63) 


where P\ is a permutation matrix. In the proof of Theorem 6.4 we showed that by pre¬ 
multiplying (53) by Q i, we can block upper-triangulate the KKT system as in (55), i.e., 
Q\PiHP( is block upper-triangular. This was in fact equivalent to the first stage of the 
upward pass in Algorithm 3, which corresponds to sending messages from the agents at 
the leaves of the tree to their parents. If we now in this stage multiply Q\P\HP( from 
the right by Qf, it is straightforward to verify that we arrive at 


ChPiHPfQj 
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(64) 


and as a result we have block-diagonalized H, where we have t+1 blocks on the diagonal. 
Notice that the first t blocks on the diagonal are the matrices O' for i = i\,,.. ,it that 
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are known to each of the agents at the leaves. Furthermore, the information needed to 
form Qi is distributedly known by the agents at the leaves, since we can write Q\ as 
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This then means that not only it is possible to block-triangulate H in the first stage 
of the upward pass as in (64), but also the information that is needed to do so is 
distributed among the involved agents and is based on their local information. It is 
possible to continue this procedure by block-triangulating the last diagonal block in 
right hand side of (64) as below 
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( 66 ) 


where similar to the previous stage P -2 is a permutation matrix and Q 2 is computed 
using a similar approach as Q \. Here the newly generated small diagonal blocks are the 
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matrices O* with i being indices of the parents of the leaves that have received all mes¬ 
sages from their children. This step of block-diagonalization can be accomplished after 
the second step of the upward pass in the clique tree. We can continue this procedure 
upwards through the tree until we have q blocks on the diagonal at which point we have 
arrived at the root of the tree. So having finished the upward pass we have computed 


Ql+iPl+i x 
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(67) 


with q diagonal elements that are the matrices O* for i € N 9 . Notice that this means that 
by the end of an upward pass through the tree we have in fact computed an indefinite 
block LDL t factorization of H where both the computation and storage of the factors 
are done distributedly over the clique tree. 

Remark 7 As was shown in this section, the message-passing scheme can he viewed as 
a distributed multi-frontal indefinite block LDL T factorization technique that relies on 
fixed pivoting. This reliance is in conformance with and dictated by the structure in the 
problem which can in turn make the algorithm vulnerable to numerical problems that can 
arise, e.g., due ill-posed subproblems. Such issues can be addressed using regularization 
and/or dynamic pivoting strategies. Here, however, we abstain from discussing such 
approaches as the use of them in a distributed setting is beyond the scope of this paper. 

So far we have described a distributed algorithm for computing the primal-dual di¬ 
rections. In the next section we put forth a distributed framework for computing proper 
step sizes for updating the iterates, and we will also propose a distributed method for 
checking the termination condition at every iteration. 


6.4. Distributed Step Size Computation and Termination 

In this section, we first propose a distributed scheme for computing proper step sizes 
that relies on the approach described in Section 5. This scheme utilizes, the clique tree 
used for calculating the primal-dual directions, for computing the step size. Similar to 
the message-passing procedure discussed in the previous section, in this scheme we also 
start the computations from the leaves of the tree. The proposed scheme comprises of 
two stages. During the first stage a step size bound is computed that assures primal 
and dual feasibility, with respect to the inequality constraints, of the iterates, and then 
during the second stage a back tracking line search is conducted for computing the step 
size which also assures persistent decrease of primal and dual residual norms. Within 
the first stage, let each leaf of the tree, i, firstly compute its bound dfU+i) performing 
a local line search. This means that initially every agent at the leaves computes 
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and then performs a local line search based on its corresponding inequality constraints, 
i.e., G k for k G 4>i, to compute 

while 3 j and k : G k (x^ + cd’^ +1 ) Ax^ +1 )) > 0 do 
«*’(*+1) = /3a®’( /+1 ) 

end while 

with /3 G (0,1) and initialized as 0.990^^. These agents will also compute the 

quantities 
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( 68 ) 
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( 69 ) 


These will be used in the second stage of the step size computation. Once all leaves 
have computed their corresponding Pnorm and dnorm, they send these quan¬ 

tities to their parents where they will also conduct a similar line search and sim¬ 
ilar computations as the ones performed in the leaves. Specifically, let agent p be 
a parent to some leaves. Then the only differences between the computations con¬ 
ducted by this agent and the leaves are in that the line search above is initialized 
as minimum {minimum feech (p) . 0 . 99 « n ia Y } and that 
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(70) 


Using this procedure each agent communicates its computed a l ^ l+1 \ pnorm and dn ® m 
upwards through the tree to the root. Once the root has received all the computed step 
size bounds from its children/neighbors, it can then compute its local step size bound 
in the same manner. However, the computed bound at the root, d r, 0 +1 ), would then 
constitute a bound on the step size for updating the iterates which ensures primal and 
dual feasibility for the whole problem. Furthermore the computed dkonn and d n orm at 
the root will then constitute the norm of the primal and dual residuals for the whole 
problem computed at the iterates at iteration l. This finishes the first stage of the step 
size computation. The second stage, is then started by communicating this step size 
bound downwards through the tree until it reaches the leaves. At which point each agent 
at the leaves computes the quantities Pnorm^ and pnorm^ as above with the updated local 
iterates using the step size a r ^ l+1 \ These quantities are then communicated upwards 
through the tree to the root where each agent having received these quantities from all 
its children computes its corresponding Pnorm^ and Pnorm^ as in (70) using the updated 
local iterates. Once the root have received all information from its children it can also 
compute its corresponding quantities which correspond to the primal and dual residuals 
for the whole problem computed at the updated iterates using the step size a r,( ' l+1 ' > . 
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Then in case 
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we set a r ’^ +1 ^ = /3a r ’^~ , ~ 1 ) and the same procedure is repeated. However if the condition 
above is not satisfied, the step size computation is completed and we can choose = 

a r, ( l+ 1), which is then communicated downwards through the tree until it reaches the 
leaves. At this point all agents have all the necessary information to update their local 
iterates. Notice that since all the agents use the same step size, the updated local iterates 
would still be consistent with respect to one another. 

Having updated the iterates, it is now time to decide on whether to terminate the 
primal-dual iterations. In order to make this decision distributedly, we can use a similar 
approach as for the step size computation. Particularly, similar to the approach above, 
the computations are initiated from the leaves where each leaf i computes the norm of 
its local surrogate duality gap as 

= Y -(A fc ’ (z+1) ) T G fc (x^ +1) ) / 72 ) 

k£<pi 

The leaves then communicate these computed quantities to their corresponding parents, 
which will then perform the following computations 


1) 


Y -(X k ’ {l+1) ) T G k (xY 1) ) + Y 


ke&i 
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(73) 


This approach is continued upwards through the tree until we reach the root. The com¬ 
puted quantity by the root, i.e., will then be equal to the surrogate duality gap 

for the whole problem. This quantity together with Pnoira ^, ^norm^, which was computed 
during the step size computation, are used by the agent at the root to decide whether 
to terminate the primal-dual iterations. In case the decision is to not to terminate the 
iterations, then the computed surrogate duality gap is propagated downwards through 
the tree until it reaches the leaves of the tree, which then enables each of the agents to 
compute the perturbation parameter, t, and form their respective subproblems for the 
next primal-dual iteration. However, in case the decision is to terminate, then only the 
decision will then be propagated downwards through the tree. 

By now we have put forth a distributed primal-dual interior-point method for solving 
loosely coupled problems. In the next section we summarize the proposed algorithm and 
discuss its computational properties. 


6.5. Summary of the Algorithm and Its Computational Properties 

Let us reconsider the problem in (23). As was mentioned before, this problem can be seen 
as a combination of N subproblems each of which is expressed by the objective function 
fi and equality and inequality constraints defined by A 1 , bi and G l , respectively. Given 
such a problem and its corresponding sparsity graph G s , in order to set up the proposed 
algorithm, we first need to compute a chordal embedding for the sparsity graph. Having 
done so, we compute the set of cliques C g = {C\, C 2 , ■ ■ ■, C q } for this chordal embedding 
and a clique tree over this set of cliques. With the clique tree defined, we have the 
computational graph for our algorithm, and we can assign each of the subproblems to a 
computational agent, using the guidelines discussed in Section 4. At this point we can 
perform the preprocessing procedure presented in Section 6.2.1, if necessary, and apply 
our proposed distributed algorithm as summarized below to the reformulated problem. 
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Given l = 0, /jl > 1, e > 0, ef eas > 0, A^ 0 ) > 0, v^°\ such that -< 0 for all 

i = l,...,N, f)^ = 'FiLi -(A*’(°)) t G*(x<W) and t = (jj, J^iLi m i ) /v^ 

repeat 

for i = 1,..., q do 

Given t, x^}, and for k G fy, agent i forms its 
quadratic subproblems based on its assigned objective 
functions and constraints as described in (27)—(35). 

end for 

Perform message-passing upwards through the clique tree 
Perform a downward pass through the clique tree where each agent i 
having received optimal solutions Ax* , 

& ipar(i) 

computes Axj/ +1 ) and Au*’^ +1 ^ using (49); 

and then computes AA fc, ^ +1 ) for all k G 4>i using (58). 

Compute a proper step size, a^ l+1 \ by performing 
upward-downward passes through the clique tree as discussed 
in Section 6.4. 
for i = 1,..., q do 
Agent i updates, 

2,0+1) _ x (l) _|_ Q,(i+i) Ax^"* -1 ); 

X k,(i+1) = A fc,(0 + a ( ,+ i) AA fc ’( ,+ i) for all k G 

1 ) = yh(!) _)_ 

end for 

Perform upward-downward pass through the clique tree to 
decide whether to terminate the algorithm and/or to update 

the perturbation parameter t = ^TiLi 171 ^ /v^ l+1 ^- 
1 = 1 + 1 . 

until the algorithm is terminated 

As can be seen from the summary of the algorithm above, at each iteration of the primal- 
dual method we need to perform several upward-downward passes through the clique 
tree, one for computing the primal variables direction, one to make decision regarding 
terminating the algorithm and/or for updating the perturbation parameter and several 
for computing a proper step size. Notice that among the required upward-downward 
passes, the one conducted for computing the primal and dual variables directions is by far 
the most computationally demanding one. This is because at every run of this upward- 
downward pass each agent needs to form (49), which requires inverting its corresponding 
O' . Since primal-dual interior point methods commonly converge to a solution within 
30-50 iterations, the computational burden for each agent is dominated by at most 50 
factorizations that it has to compute within the run of the primal-dual algorithm. Also 
notice that the required number of upward-downward passes for computing the step 
size, depend on the back-tracking parameters a and (3 and it is possible to reduce this 
number by tuning these parameters carefully. Furthermore, for the final iterations of 
the primal-dual method, also known as quadratic convergence phase, there would be 
no need for any back-tracking operation. Let us assume that the height of the tree is 
equal to L and that the total number of upward-downward passes that is required to 
accomplish the second stage of step size computations is equal to B. Then assuming 
that the primal-dual method converges within 50 iterations, the total number of upward- 
downward passes would mount to B + 3 x 50 and hence the algorithm converges after 
2xfx(B + 3x 50) steps of message passing. Also within the run of this distributed 
algorithm each agent would then need to compute a factorization of a small matrix at 
most 50 times and communicate with its neighbors 2 x (B + 3 x 50) times. 
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Remark 8 As was discussed in Remark 4 the primal-dual method used in this paper is 
an infeasible long step primal-dual method, which requires solving (21) or (22) only once 
at each iteration. However in predictor-corrector variants of primal-dual methods, com¬ 
puting the search directions requires solving (21) or (22) twice with different r^ terms. 
This means that distributed algorithms based on message-passing that rely on predictor- 
corrector primal-dual methods would need two upward-downward passes to compute the 
search directions. However, despite the change of A l \ the matrices O* formed by each 
agent during the upward-pass of the message-passing remains the same for both of the 
mentioned upward-downward passes. Consequently, each agent by caching the factor¬ 
ization of O* at each iteration of the primal-dual method can significantly reduce the 
computational burden of the second upward-downward pass. Notice that considering the 
discussion in Section 6.3, this approach is equivalent to the caching of the factorization 
of the coefficient matrix of (21). 

Remark 9 As can be seen from the summary of the proposed algorithm, we need to 
initialize the algorithm with a feasible starting point, i.e., such that G z (x^) -< 0 

for all i = 1,..., N and A( 0 ) > 0. Constructing a A^°) > 0 can be done independently by 
each agents. However, producing a suitable x ^ is nontrivial. In order to generate such 
a starting point we suggest making use of a Phase I method based on minimizing sum 
of infeasibilties, [8], which entails solving the following optimization problem 

N 

minimize 1 T s 1 (74a) 

’ i =1 

subject to G l {Ej i x) ■< s l , i = 1,..., N, (74b) 

s i y —e, i = l,...,N, (74c) 

where e is a very small positive scalar and S = (s 1 ,..., s N ) with s l £ M mi . In case the 
optimal objective value of the problem is equal to —e x md, then the solution x* 

of the problem constitutes a proper starting point for our proposed distributed algorithm. 
Notice that the problem in (74) has the same coupling structure as in (23), and hence 
we can use our proposed distributed algorithm, based on the same clique tree or compu¬ 
tational graph, for computing a feasible starting point. However, for the problem in (74), 
we can easily construct a proper starting point for the algorithm. For instance, x ^ = 0 
and sf 0) = max^G^Ej.x^), — e) constitute a feasible starting point, which each agent 
can compute independently from others. 

Next we illustrate the performance of the algorithm using a numerical experiment. 


7. Numerical Experiments 

In this section, we investigate the performance of the algorithm using an example. To 
this end we consider a flow problem over a tree where having received input flows 
from the leaves of the tree, i.e., Ui for all i £ leaves(T), the collective of agents are 
to collaboratively provide an output flow from the root of the tree that is as close as 
possible to a given reference, O re f. We assume that each agent i in the tree produces an 
output flow fi that depends on the flow it receives from its children and the use of its 
buffer which is described using its buffer flow di, where a positive di suggests borrowing 
from the buffer and a negative di suggests directing flow into the buffer. Furthermore, 
there exists a cost associated with the use of the buffer and a toll for using each edge 
for providing flow to respective parents. The setup considered in this section is depicted 
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di 



in Figure 6, that is based on a tree with 7 agents. We intend to provide the requested 
output flow from the tree while accepting the input flow to the leaves, with minimum 
collective cost for the agents in the network. This problem can be formulated as 


H ' 1 1 

minimize ^ - (jmxj + PiX q+i ) + - [a X (x q+1 - O re f) 2 + p\x\) 
i =2 w 

Ui ~h Xi — Xq+i 
\Xi\ < Ci 

SfcGch(i) x q+k + x i = X q+ i 
I Xi\ < Ci 


subject to 


i £ leaves (T) 

i £ N q \ leaves (T), 


(75a) 

(75b) 

(75c) 


where x = (di ,..., d q , /i,..., f q ) with q = 7, the parameters Hi, pi and c t denote the 
buffer use cost, the toll on outgoing edge and the buffer use capacity for each agent i, 
respectively, and a denotes the cost incurred on the agent at the root for providing a 
flow that deviates from the requested output flow. Here we assume that the values of the 
parameters , c^, a are private information for each agent, which makes it impossible 
to form the centralized problem. Let us now rearrange the terms in the cost function 
and rewrite the problem as 


minimize 

X 



1=2 



, Pi 


2 

q+i 


+ E 

fcGch(i) 




a x (x q+ i - Oj 


ef) 2 + plX\ + 


E 

fcGch(l) 


Pk 2 

r, x q+k 


(76a) 
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C 4 — {4,11} Cq = {6,13} C 7 = {7,14} C 3 = {3,10} 

Figure 7. The corresponding clique tree for the sparsity graph of the flow problem 

Ui f — %q-\-i 

subject to \xi\ < Ci > i G leaves(T) 

■l'q -ii >0 J 

SfcGch(i) x q+k + x i = x q+i j 

\xi\ < Ci > i 6 N g \ leaves(T) 

x q+i >0 J 

This problem can now be seen as a combination of q = 7 coupled subproblems where 
each of the subproblems is defined by each of the q terms in the cost function and 
each of the q constraint sets. The clique tree for the sparsity graph of this problem is 
illustrated in Figure 7 and has the same structure as the flow network. As a result, 
this problem can be solved distributedly using the proposed message-passing algorithm 
while respecting the privacy of all agents. We have solved 50 instances of the problem 
in (76) where the parameters are chosen randomly with uniform distribution such that 
Ui G (0,20), /j,; € (0,10), pi G (0,5), Cj € (0,15), O re f G (0,20) and a G (0,50). The 
parameters describing the stopping criteria for all instances are chosen to be the same 
and are given as ef eas = 10~ 8 and e = 10 -10 , and for all cases the initial iterates are 
chosen to be = t/ 0 -* = 1 and = (ci/2,..., c q /2, 1,..., 1). Also the parameters 
used for computing the step sizes are chosen to be a = 0.05 and (3 = 0.5. In the worst 
case the primal-dual algorithm converged after 14 iterations. The convergence behavior 
of the algorithm for this instance of the problem is studied by monitoring the primal and 
dual residuals, the surrogate duality gap and the distance to the optimal solution, as 
depicted in Figure 8. As expected the behavior resembles that of a primal-dual method. 
The optimal solution x* used for generating Figure 8-c is computed using YALMIP 
toolbox, [23]. Also the worst case total number of backtracking steps for computing step 
sizes was equal to 7, which was also obtained for this instance of the problem. So in 
total we required 2x3x(7 + 3x 14) = 294 steps of message-passing to converge to the 
optimal solution out of which only 42 steps required agents to compute a factorization 
and the rest were computationally trivial. Notice that during the run of this distributed 
algorithm, each agent needed to compute a factorization of a small matrix only 14 times 
and required to communicate with its neighbors 98 times. 

We also tested the performance of the algorithm using a larger flow problem. The 
tree used for describing this problem was of height L = 14 and was generated such that 
all agents, except the ones at the leaves, would have two children. A tree generated in 
this manner then comprises of 2 14+1 — 1 = 32767 nodes and the problem defined on 
this tree has 65534 variables. The parameters that were used for defining the problem 
and that were used in the algorithm were chosen in the same manner as above. For this 


(76b) 


(76c) 
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Figure 8. The corresponding clique tree for the sparsity graph of the flow problem 


problem the primal-dual algorithm converged after 27 iterations and required a total of 
21 backtracking steps. The distributed algorithm hence converged after 2856 steps dur¬ 
ing which each agent required computing 27 factorizations and needed to communicate 
with its neighbors 204 times. 


8. Conclusion 

In this paper we proposed a distributed optimization algorithm based on a primal-dual 
interior-point method. This algorithm can be used for solving loosely coupled problems, 
as defined in Section 6.1. Our proposed algorithm relies solely on second order opti¬ 
mization methods, and hence enjoys superior convergence properties in comparison to 
other existing distributed algorithms for solving loosely coupled problems. Specifically, 
we showed that the algorithm converges to a very high accuracy solution of the problem 
after a finite number of steps that entirely depends on the coupling structure in the 
problem, particularly the length of the clique tree of its corresponding sparsity graph. 


Appendix A. Proof of Theorem 4.2 

We prove this theorem by induction. Firstly, note that for all neighboring agents i and j, 


Moreover 


ViASu 


U Aki \ s ik ) 

fceNe(i)\{j) 


u (Q \ Sn ). 


(Al) 


Ci n (Vfci \ Sik) = 0 v k € Ne(z), (A2) 

and 

(V Zli \ S iZl ) n (V Z2i \ S iZ2 ) = 0 V zi, *2 € Ne(i), 3 i ± z 2 , (A 3 ) 

where (A3) is because the clique tree is assumed to satisfy the clique intersection prop¬ 
erty. These properties can also be verified for the clique tree in Figure 2. For instance 
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let us consider agent 2 for which we have 


V 21 \ S 21 = {1,3,4,6,7,8} \ {1,4} 


u (142 \ S 2k ) 

fc€Ne(2)\{l} 


U (C 2 \ S 21 ) 


— (V42 \ <524) u (V52 \ 525) u (C2 \ S21) 


= ({3,6, 7} \ {3})U({3,8} \ {3})U({1,3,4} \ (1,4}) 
= {6,7}U{8}U{3}, 


(A4) 


where as expected from (A2) and (A3), the three sets making V 21 \ <521 are jointly 
disjoint. 

We start the induction by first showing that (8) holds for all the messages originating 
from the leaves of the tree, i.e., for all i G leaves(T). This follows because for these nodes 
Wij = {*} and hence Vij = C t . Now let us assume that % is a node in the middle of the 
tree with neighbors Ne(i) = {k \,..., k m }, see Figure 3, and that 


m k Ax s 


= minimum 

\i\s ifc 



F t{Xj 


V j = 1, 


, m. 


(A5) 


Then (6) can be rewritten as 


nriij(x s ..) = 


minimum 


E « K >+ 


minimum 

x v kli \s ikl I te<J> 


E \ + ■ 


+ minimum ^ 

Xv k m \s ikm 


E 


>• (A6) 


Note that \ </>* = Ui e Ne(i)\{j} with n $2 * = 22 € Ne W \ U}> z i ¥= z 2 - 

This is guaranteed since each component of the objective function is assigned to only 
one agent. Then by (A2) and (A3) we have 


mn(x a ) = minimum minimum... minimum < 

J x ' t t r 1 

■ L c i \s ij ■ L v kli \s ikl ■ L v kmi \s ikm 


E ^ 


(A7) 


te$i 


Now we can merge all the minimum operators together and, by (Al), rewrite (A7) as 


rriij {x s . .) = minimum < 


V ij \ S ij 


E ^ 


( ’ 


(A8) 


te$i 


which completes the proof. 
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Appendix B. Proof of Theorem 4.3 

Using Theorem 4.2, we can rewrite (9) as 


x = argmm < 


Ea< 

kE(f) r 


x , ) + minimum 


E 


V k 1 r\ S rk 1 I 


X Jt ) > + . 


m k . 


+ minimum \ 

X V krr \S rkr 


E «K> 


> (Bl) 




where we have assumed that Ne(r) = {fci,..., Ay}. Note that N n \(7 r = UfceNe(r) F kr\S r k- 
Then by (A3) we can push the minimum operators together and rewrite (Bl) as 


x = argmm 




k€<f) r 


+ minimum { 

X Kn\Cr 


E 


^ /cGNe(r) tE&kr 



(B2) 


Moreover, since Nat \ <f> r = U/ceWefr) ®kr and that Jk Q C r , we can further sim¬ 

plify (B2) as 


x c = argmm 


minimum 

X N n \C r 




(B3) 


which completes the proof. 
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